A lexico-semantic pattern language for learning ontology instances from text
نویسندگان
چکیده
The Semantic Web aims to extend the World Wide Web with a layer of semantic information, so that it is understandable not only by humans, but also by computers. At its core, the Semantic Web consists of ontologies that describe the meaning of concepts in a certain domain or across domains. The domain ontologies are mostly created and maintained by domain experts using manual, time-intensive processes. In this paper, we propose a rule-based method for learning ontology instances from text that helps domain experts with the ontology population process. In this method we define a lexico-semantic pattern language that, in addition to the lexical and syntactical information present in lexico-syntactic rules, also makes use of semantic information. We show that the lexico-semantic patterns are superior to lexico-syntactic patterns with respect to efficiency and effectivity. When applied to event relation recognition in text-based news items in the domains of finance and politics using Hermes, an ontology-driven news personalization service, our approach has a precision and recall of approximately 80% and 70%, respectively.
منابع مشابه
Variation and Semantic Relation Interpretation: Linguistic and Processing Issues
Studies in linguistics define lexico-syntactic patterns to characterize the linguistic utterances that can be interpreted with semantic relations. Because patterns are assumed to reflect linguistic regularities that have a stable interpretation, several software implement such patterns to extract semantic relations from text. Nevertheless, a thorough analysis of pattern occurrences in various c...
متن کاملOntology Learning by Analyzing XML Document Structure and Content
Most existing methods for ontology learning from textual documents rely on natural language analysis. We extend these approaches by taking into account the document structure which bears additional knowledge. The documents that we deal with are XML specifications of databases. In addition to classical linguistic clues, the structural organization of such documents also contributes to convey mea...
متن کاملPresenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
متن کاملControlled Knowledge Base Enrichment from Web Documents
The Linked Open Data initiative brought more and more RDF data sources to be published on the Web. However, these data sources contain relatively little information compared to the documents available on the surface Web. Many annotation tools have been proposed in the last decade for the automatic construction and enrichment of knowledge bases. But, while noticeable advances are achieved for th...
متن کاملOntology Enrichment for the Food Traceability Domain Using Romanian Lexico-syntactic Patterns
Ontologies are considered as the most important building blocks of semantic Web. Building such ontologies is a time consuming and difficult task, which requires a high degree of human intervention. In this paper we describe a method to facilitate the enrichment of Romanian language domain taxonomies by using a text-mining approach. We exploit Romanian domain specific texts in order to automatic...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- J. Web Sem.
دوره 15 شماره
صفحات -
تاریخ انتشار 2012